[MOD-8206] INT8 flow tests #573

meiravgri · 2024-12-19T11:38:20Z

This pull request introduces flow tests for the VecSimType_INT8 data type.

Generalized Test Classes

Implemented a GeneralTest class for running common tests on BF and HNSW indices. It is currently used only by TestINT8.
Created the TestINT8 class, which inherits from GeneralTest.
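
The shared-base-class pattern described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `FakeFlatIndex`, `create_index`, `create_data`, and the test method are hypothetical stand-ins, and an exact NumPy search replaces the real BF/HNSW bindings.

```python
import numpy as np


class FakeFlatIndex:
    """Hypothetical stand-in for a real index binding (exact L2 search)."""

    def __init__(self, dim, dtype):
        self.dim, self.dtype = dim, dtype
        self.labels, self.vectors = [], []

    def add_vector(self, vector, label):
        self.vectors.append(np.asarray(vector, dtype=self.dtype))
        self.labels.append(label)

    def knn_query(self, query, k):
        # Widen to float32 so squared differences of int8 values cannot wrap.
        data = np.stack(self.vectors).astype(np.float32)
        dists = ((data - np.asarray(query, dtype=np.float32)) ** 2).sum(axis=1)
        order = np.argsort(dists, kind="stable")[:k]
        return [self.labels[i] for i in order], dists[order]


class GeneralTest:
    """Type-agnostic checks; subclasses supply the dtype and data generator."""

    dim = 16
    num_elements = 50
    dtype = None  # set by subclasses

    def create_data(self, rng):
        raise NotImplementedError

    def create_index(self):
        return FakeFlatIndex(self.dim, self.dtype)

    def test_self_is_nearest(self):
        rng = np.random.default_rng(42)
        data = self.create_data(rng)
        index = self.create_index()
        for label, vector in enumerate(data):
            index.add_vector(vector, label)
        # A stored vector queried against the index is at distance zero.
        labels, dists = index.knn_query(data[7], k=1)
        assert dists[0] == 0.0


class TestINT8(GeneralTest):
    dtype = np.int8

    def create_data(self, rng):
        shape = (self.num_elements, self.dim)
        return rng.integers(-128, 128, size=shape, dtype=np.int8)
```

With this layout, supporting another data type would only require a new subclass that sets `dtype` and overrides `create_data`.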

Bindings support

Added support for VecSimType_INT8 in the PyHNSWLibIndex:
- HNSWIndex::saveIndex
- HNSWIndex::checkIntegrity

New Helper Functions

Introduced create_flat_index and create_add_vectors to simplify creating a flat index and adding vectors to an index.
Added create_int8_vectors, which generates INT8 vectors for testing.
fp32_expand_and_calc_cosine_dist takes two numpy arrays and converts them to np.float32 before computing the cosine distance, which avoids overflow in the calculation.
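
To illustrate why the float32 expansion matters, here is a minimal sketch of the two INT8 helpers. The function names come from the description above, but the bodies are assumptions written for illustration (including the `rng` parameter and the definition of cosine distance as 1 minus cosine similarity); they are not the code merged in this PR.

```python
import numpy as np


def create_int8_vectors(shape, rng=None):
    """Random int8 array of the given shape (a tuple). Hypothetical body."""
    rng = rng or np.random.default_rng(0)
    return rng.integers(-128, 128, size=shape, dtype=np.int8)


def fp32_expand_and_calc_cosine_dist(a, b):
    """Cosine distance (assumed: 1 - cosine similarity), computed in float32."""
    a32 = np.asarray(a).astype(np.float32)
    b32 = np.asarray(b).astype(np.float32)
    return 1.0 - np.dot(a32, b32) / (np.linalg.norm(a32) * np.linalg.norm(b32))


# Why the widening matters: int8 arithmetic wraps around silently.
v = np.full(4, 100, dtype=np.int8)
wrapped = np.dot(v, v)  # products and sum stay in int8 and overflow
exact = np.dot(v.astype(np.float32), v.astype(np.float32))  # 4 * 100 * 100
```

Here `exact` is 40000.0, while `wrapped` cannot hold that value because the int8 range ends at 127.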

Additional Improvements

Updated the IndexCtx class to include a type_to_dtype mapping from each supported data type to its numpy dtype.
Modified the __init__ method of IndexCtx to accept a create_data_func parameter, allowing custom data creation functions to be passed.
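
A rough sketch of how such a mapping and an optional data-creation hook can fit together is shown below. Everything except the names `IndexCtx`, `type_to_dtype`, and `create_data_func` is hypothetical: the string keys stand in for the VecSimType_* constants, and the constructor signature and generator functions are assumptions.

```python
import numpy as np

# Hypothetical string keys; the real tests would key this on VecSimType_* values.
type_to_dtype = {
    "FLOAT32": np.float32,
    "FLOAT64": np.float64,
    "FLOAT16": np.float16,
    "INT8": np.int8,
}


def default_create_data(shape, dtype, rng):
    """Fallback generator for floating-point types: uniform values in [0, 1)."""
    return rng.random(shape).astype(dtype)


def create_int8_data(shape, dtype, rng):
    """Custom generator for int8, drawing integers from the full int8 range."""
    return rng.integers(-128, 128, size=shape, dtype=dtype)


class IndexCtx:
    def __init__(self, data_type="FLOAT32", dim=16, num_vectors=100,
                 create_data_func=None, seed=0):
        self.dtype = type_to_dtype[data_type]
        self.rng = np.random.default_rng(seed)
        # Use the caller's generator when given, otherwise the float default.
        create = create_data_func or default_create_data
        self.data = create((num_vectors, dim), self.dtype, self.rng)


int8_ctx = IndexCtx("INT8", create_data_func=create_int8_data)
```

The hook is useful for integer types because casting uniform floats in [0, 1) to int8 truncates every value to zero, so the default generator would produce degenerate test data.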

codecov · 2024-12-19T12:06:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.05%. Comparing base (d17629c) to head (9ac3449).
Report is 1 commits behind head on meiravg_feature_int_uint_8.

Additional details and impacted files

@@                      Coverage Diff                       @@
##           meiravg_feature_int_uint_8     #573      +/-   ##
==============================================================
+ Coverage                       96.95%   97.05%   +0.09%     
==============================================================
  Files                             103      104       +1     
  Lines                            5450     5496      +46     
==============================================================
+ Hits                             5284     5334      +50     
+ Misses                            166      162       -4


GuyAv46

@alonre24 We should refactor our flow tests, to be more generic, and also remove irrelevant tests (or move to a test_bindings.py file if all they do is to check the bindings)

GuyAv46 · 2024-12-23T15:08:51Z

tests/flow/common.py

+
+    return BFIndex(bfparams)
+
+def create_add_vectors(index, vectors):


why is it called "create"?

meiravgri · 2024-12-23

It also creates the list of (key, vector) tuples.
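
For readers without the diff at hand, a helper with this behavior might look like the sketch below. It is a guess at the shape of the function based on the reply above, not the code in tests/flow/common.py: `RecordingIndex`, the `add_vector(vector, label)` call, and the use of row positions as keys are all assumptions.

```python
import numpy as np


class RecordingIndex:
    """Hypothetical stand-in that only records add_vector calls."""

    def __init__(self):
        self.added = {}

    def add_vector(self, vector, label):
        self.added[label] = np.asarray(vector)


def create_add_vectors(index, vectors):
    """Build the (key, vector) list and insert each vector under its key."""
    # Keys are assumed to be the row positions 0..n-1.
    vectors_data = [(key, vector) for key, vector in enumerate(vectors)]
    for key, vector in vectors_data:
        index.add_vector(vector, key)
    return vectors_data
```

Returning the (key, vector) list lets a test compare search results against the exact data that was inserted.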

* [MOD-8198] Introduce INT8 distance functions (#560)
  * naive implementation of L2
  * update
  * implement naive distance for int8; add cosine to spaces; fix typos in calculator
  * imp choose L2 int8 with 256bit loop; add spaces unit tests for int8 L2; add compilation flags; introduce tests/utils for general utils
  * imp space bm for int8; change INITIALIZE_BENCHMARKS_SET to INITIALIZE_BENCHMARKS_SET_L2_IP; introduce INITIALIZE_BENCHMARKS_SET_COSINE; fix typos in Choose_INT8_L2_implementation_AVX512F_BW_VL_VNNI name
  * fix INITIALIZE_BENCHMARKS_SET_L2_IP and add include to F_BW_VL_VNNI
  * rename unit/test_utuils to unit_test_utils
  * seed create vec
  * format
  * implement IP + unit test
  * ip bm
  * format
  * implement cosine in ip API change create_int8_vec to populate_int8_vec; add compute norm
  * use mask sub instead of mask load
  * loop size = 512; minimal dim = 32
  * add int8 to bm
  * rename to simd64
  * convert to int before multiplication
  * review comments: align to vector size including the norm in cosine dist; unit test: cover small dim in cosine chooser
  * use sizeof(float) instead of 4
  * remove int conversion in test_utils::compute_norm
  * REVERT!!! malicious test to see if we get to the code
  * assert dummy
  * fix alignment test
  * remove assert
  * remove cosine alignment
* Override missing intrinsics in gcc <11 (#572)
  * override _mm256_loadu_epi8 with _mm256_maskz_loadu_epi8 if gcc < 11 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95483
  * fix
  * disable flow temp
  * add comment
* [MOD-8200] [MOD-8202] INT8 index (#566)
  * naive implementation of L2
  * update
  * implement naive distance for int8; add cosine to spaces; fix typos in calculator
  * imp choose L2 int8 with 256bit loop; add spaces unit tests for int8 L2; add compilation flags; introduce tests/utils for general utils
  * imp space bm for int8; change INITIALIZE_BENCHMARKS_SET to INITIALIZE_BENCHMARKS_SET_L2_IP; introduce INITIALIZE_BENCHMARKS_SET_COSINE; fix typos in Choose_INT8_L2_implementation_AVX512F_BW_VL_VNNI name
  * fix INITIALIZE_BENCHMARKS_SET_L2_IP and add include to F_BW_VL_VNNI
  * rename unit/test_utuils to unit_test_utils
  * seed create vec
  * format
  * implement IP + unit test
  * ip bm
  * format
  * implement cosine in ip API change create_int8_vec to populate_int8_vec; add compute norm
  * use mask sub instead of mask load
  * loop size = 512; minimal dim = 32
  * add int8 to bm
  * rename to simd64
  * convert to int before multiplication
  * introduce IntegralType_ComputeNorm
  * move the logic that chooses whether a cosine preprocessor is needed to CreateIndexComponents: pass bool is_normalized; get distance function according to original metric; get pp according to is_normalized && metric == VecSimMetric_Cosine, and remove this logic from the index factories. Add dataSize member to AbstractIndexInitParams; add VecSimType_INT8 type; introduce VecSimParams_GetDataSize: returns datasize; introduce and implement GetNormalizeFunc<int8_t> that returns int8_normalizeVector; int8_normalizeVector computes the norm and stores it at the end of the argument vector.
  * add int8 tests
  * fix include unint_test_utils
  * add int8 to index factories; remove normalize_func from VecSimIndexAbstract members; tests: int8 unit test, create int8 indexes; unit_test_utils: CalcIndexDataSize casts VecSimIndex * to VecSimIndexAbstract<dist_t, data_t> * and calls VecSimIndexAbstract<dist_t, data_t>::getDataSize(); cast_to_tiered_index<data_t, dist_t> takes VecSimIndex * and casts to TieredHNSWIndex<data_t, dist_t> *
  * add EstimateInitialSize for int8 to index factories; 2 new functions in test_utils: CreateTieredParams, CreateNewTieredHNSWIndex; add test_initial_size_estimation to CommonTypeMetricTests; use CommonTypeMetricTieredTests for tiered tests
  * add int8 unit tests; add int8 to: VecSimDebug_GetElementNeighborsInHNSWGraph, VecSim_Normalize, HNSW NewIndex from file
  * remove duplicated GetDistFunc<int8_t, float>; move ASSERT_DEBUG_DEATH of CalcIndexDataSize to a separate test
  * remove assert test, the statement is executed and causes crash
  * improve normalize test
  * rename test_utils::compute_norm -> test_utils::integral_compute_norm; remove test_normalize.cpp file
  * use stack allocation instead of heap allocation in tests
  * fix float comparison in test_serialization; avoid evaluating statement in typeid to avoid clang warning
  * rename CalcIndexDataSize -> CalcVectorDataSize; move components tests from test_common to test_components
  * add comment to INSTANTIATE_TEST_SUITE_P
* [MOD-8206] INT8 flow tests (#573)
  * test_hnsw.py initial
  * int8 hnsw tests
  * general tests class
  * flow_bruteforce.py: introduce GeneralTest, call from TestINT8; common.py: introduce create_flat_index and create_add_vectors, move fp32_expand_and_calc_cosine_dist to common.py
  * tiered flow tests: add optional create_data_func to IndexCtx, use for special datatypes; introduce test_create_int8 and test_search_insert_int8; create_int8_vectors expects shape (tuple)
  * use query.flat
  * revert using flat (not helping in int8); fix float16 calling query.flat
  * revert changes in Data class in bf tests; revert test_bf_float16_range_query change
  * fix merge

[MOD-8198] Introduce INT8 (#560) (#571)

(cherry picked from commit babfbe0)

Co-authored-by: meiravgri <109056284+meiravgri@users.noreply.github.com>

Base automatically changed from meiravg_compute_norm to meiravg_feature_int_uint_8 December 22, 2024 10:34

meiravgri added 3 commits December 22, 2024 12:35

test_hnsw.py initial

5bd62ef

int8 hnsw tests

1aeaa37

general tests class

2c463bd

meiravgri force-pushed the meiravg_int8_bindings branch from 45aa7f4 to 2c463bd Compare December 22, 2024 12:35

meiravgri added 5 commits December 22, 2024 16:44

flow_bruteforce.py:

adae175

introduce GeneralTest, call from TestINT8
common.py: introduce create_flat_index and create_add_vectors, move fp32_expand_and_calc_cosine_dist to common.py

tiered flow tests:

25b3b7a

* add optional create_data_func to IndexCtx, use for special datatypes
* introduce test_create_int8 and test_search_insert_int8
create_int8_vectors expects shape (tuple)

use query.flat

6fd0aed

revert using flat (not helping in int8)

358361a

fix float16 calling query.flat

revert changes in Data class in bf tests

9ac3449

revert test_bf_float16_range_query change

meiravgri requested a review from alonre24 December 23, 2024 08:44

meiravgri changed the title ~~Meiravg_int8_bindings~~ [MOD-8206] INT8 bindings Dec 23, 2024

meiravgri changed the title ~~[MOD-8206] INT8 bindings~~ [MOD-8206] INT8 flow tests Dec 23, 2024

meiravgri requested a review from GuyAv46 December 23, 2024 09:51

GuyAv46 approved these changes Dec 23, 2024

meiravgri merged commit 0cc4709 into meiravg_feature_int_uint_8 Dec 23, 2024
16 checks passed

meiravgri deleted the meiravg_int8_bindings branch December 23, 2024 15:29
